NSF PAR Search | NSF Public Access Repository

Data Size and Quality Matter: Generating Physically-Realistic Distance Maps of Protein Tertiary Structures

https://doi.org/10.3390/biom12070908

Alam, Fardina Fathmiul; Shehu, Amarda (July 2022, Biomolecules)

With the debut of AlphaFold2, we can now obtain a highly accurate view of a reasonable equilibrium tertiary structure of a protein molecule. Yet a single-structure view is insufficient and does not account for the high structural plasticity of protein molecules. Obtaining a multi-structure view of a protein molecule continues to be an outstanding challenge in computational structural biology. In tandem with methods formulated under the umbrella of stochastic optimization, we are now seeing rapid advances in the capabilities of methods based on deep learning. In recent work, we advanced the capability of these models to learn from experimentally available tertiary structures of protein molecules of varying lengths. In this work, we elucidate the important role that the composition of the training dataset plays in the neural network’s ability to learn key local and distal patterns in tertiary structures. To make such patterns visible to the network, we utilize a contact map-based representation of protein tertiary structure. We show interesting relationships between the size, quality, and composition of the data and the ability of latent variable models to learn key patterns of tertiary structure. In addition, we present a disentangled latent variable model which improves upon the state-of-the-art variational autoencoder-based model in key, physically realistic structural patterns. We believe this work opens up further avenues of research on deep learning-based models for computing multi-structure views of protein molecules.
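As a rough illustration of the map-based representation mentioned in the abstract, the sketch below turns a list of residue coordinates into a pairwise distance map and a binary contact map. It is not the authors' code: the function names, the toy coordinates, and the 8 angstrom cutoff (a commonly used convention) are all assumptions made for this example.

```python
import math

def distance_map(coords):
    """Pairwise Euclidean distances between residue coordinates (e.g. one atom per residue)."""
    n = len(coords)
    return [[math.dist(coords[i], coords[j]) for j in range(n)] for i in range(n)]

def contact_map(dist, threshold=8.0):
    """Binary map: 1 where two residues lie within `threshold` angstroms (assumed cutoff)."""
    return [[1 if d <= threshold else 0 for d in row] for row in dist]

# Hypothetical 4-residue chain laid out on a line, 3.8 angstroms apart.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
dist = distance_map(coords)
contacts = contact_map(dist)
```

The distance map is symmetric with a zero diagonal, so a model consuming it sees both local (near-diagonal) and distal (off-diagonal) patterns in a single matrix.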
Full Text Available
Generating Physically-Realistic Tertiary Protein Structures with Deep Latent Variable Models Learning Over Experimentally-available Structures

https://doi.org/10.1109/BIBM52615.2021.9669584

Alam, Fardina Fathmiul; Shehu, Amarda (December 2021, International Conference on Bioinformatics and Biomedicine (BIBM))

Full Text Available
Towards more equitable question answering systems: How much more data do you need?

https://doi.org/10.18653/v1/2021.acl-short.79

Debnath, Arnab; Rajabi, Navid; Alam, Fardina Fathmiul; Anastasopoulos, Antonios (August 2021, Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers))

Full Text Available
Variational Autoencoders for Protein Structure Prediction

https://doi.org/10.1145/3388440.3412471

Alam, Fardina Fathmiul; Shehu, Amarda (September 2020, Proceedings of the 11th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics)
Full Text Available
From Unsupervised Multi-Instance Learning to Identification of Near-Native Protein Structures

https://doi.org/10.29007/pjcf

Alam, Fardina; Shehu, Amarda (March 2020, EPiC Series in Computing)

A major challenge in computational biology regards recognizing one or more biologically-active/native tertiary protein structures among thousands of physically-realistic structures generated via template-free protein structure prediction algorithms. Clustering structures based on structural similarity remains a popular approach. However, clustering organizes structures into groups and does not directly provide a mechanism to select individual structures for prediction. In this paper, we provide a few algorithms for this selection problem. We approach the problem under unsupervised multi-instance learning and address it in three stages, first organizing structures into bags, identifying relevant bags, and then drawing individual structures/instances from these bags. We present both non-parametric and parametric algorithms for drawing individual instances. In the latter, parameters are trained over training data and evaluated over testing data via rigorous metrics.
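The three stages described in the abstract can be pictured with a toy pipeline. The sketch below is only one hypothetical instantiation, not the algorithms of the paper: it assumes greedy leader clustering to form bags, bag size as the relevance criterion, and the medoid as a non-parametric way to draw an instance. The one-dimensional `items` stand in for structures, and `dist` stands in for a structural dissimilarity measure.

```python
def make_bags(items, dist, radius):
    """Stage 1 (assumed): greedy leader clustering; each bag is a list of item indices."""
    bags = []
    for i, item in enumerate(items):
        for bag in bags:
            # Join the first bag whose leader (first member) is within `radius`.
            if dist(items[bag[0]], item) <= radius:
                bag.append(i)
                break
        else:
            bags.append([i])
    return bags

def pick_bags(bags, k=1):
    """Stage 2 (assumed): treat the k most populated bags as the relevant ones."""
    return sorted(bags, key=len, reverse=True)[:k]

def draw_instance(bag, items, dist):
    """Stage 3 (assumed): return the medoid, the member closest to all others in its bag."""
    return min(bag, key=lambda i: sum(dist(items[i], items[j]) for j in bag))

# Hypothetical 1-D stand-ins for structures; absolute difference stands in for dissimilarity.
items = [0.0, 0.2, 0.4, 5.0, 5.1, 9.0]
dist = lambda a, b: abs(a - b)

bags = make_bags(items, dist, radius=1.0)
best = pick_bags(bags, k=1)
chosen = [draw_instance(bag, items, dist) for bag in best]
```

Here the largest bag holds the first three items and the medoid is the middle one. A parametric variant would replace the fixed rules in stages 2 and 3 with criteria whose parameters are fitted on training data.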
Full Text Available
Deep Latent-Variable Models for Controllable Molecule Generation

https://doi.org/10.1109/BIBM52615.2021.9669692

Du, Yuanqi; Wang, Yinkai; Alam, Fardina; Lu, Yuanjie; Guo, Xiaojie; Zhao, Liang; Shehu, Amarda (January 2021, 2021 IEEE International Conference on Bioinformatics and Biomedicine (BIBM))

Representation learning via deep generative models is opening a new avenue for small molecule generation in silico. Linking chemical and biological space remains a key challenge. In this paper, we debut a graph-based variational autoencoder framework to address this challenge under the umbrella of disentangled representation learning. The framework permits several inductive biases that connect the learned latent factors to molecular properties. Evaluation on diverse benchmark datasets shows that the resulting models are powerful and open up an exciting line of research on controllable molecule generation in support of cheminformatics, drug discovery, and other application settings.
Full Text Available
Learning Reduced Latent Representations of Protein Structure Data

https://doi.org/10.1145/3307339.3343866

Alam, Fardina Fathmiul; Rahman, Taseef; Shehu, Amarda (September 2019, Comput Struct Biol Workshop (CSBW) - ACM BCB Workshops)

Full Text Available